40 research outputs found

    Multi-Level Trace Abstraction, Linking and Display

    ABSTRACT Some software problems and bugs can only be identified and resolved by analyzing runtime application behavior. Runtime analysis of multi-threaded and distributed systems is very difficult, almost impossible, using only the source code and other static software artifacts. Therefore, dynamic analysis through execution traces is increasingly used to study system runtime behavior. Execution traces usually contain large amounts of valuable information about the system execution, e.g., which process/module interacts with which other processes/modules, which file is touched by which process/module, which function is called by which process/module/function, and so on. Operating system level traces are among the most useful and effective information sources for detecting complex bugs and problems. Indeed, they contain detailed information about inter-process communication, memory usage, the file system, system calls, networking, disk blocks, etc. This information can be used to understand the system runtime behavior and to identify a large class of bugs, problems, and misbehaviors. However, execution traces may become very large, even within a few seconds or minutes of execution, making the analysis difficult. Moreover, traces are usually filled with low-level data (system calls, interrupts, etc.), so analyzing and understanding them requires deep knowledge of operating system internals.
It is often preferable for analysts to look at relatively abstract, high-level events, which are more readable and representative than the original trace data, and which reveal the same behavior at higher levels of granularity. However, to achieve such high-level data, effective algorithms and tools must be developed to process trace events, remove less important ones, highlight only the necessary data, generalize trace events, and finally aggregate and group similar and related events. The expressive nature of the synthetic events allows analysts to reason about system execution at higher levels, to uncover execution behavior in different areas, and to detect its problematic and unexpected aspects. However, to allow such analysis, an additional visualization tool may be required to display abstract events at different levels and to explore them easily. This tool may enable users to see a list of detected problems, follow the problems into the detailed levels (e.g., within the raw trace events), analyze them, and possibly discover the reasons for the detected problems. In this thesis, a framework is presented to address these challenges: to reduce the execution trace size and complexity, to generate several levels of abstract events, to organize the data in a hierarchy, to visualize it at multiple levels, and finally to enable users to perform a top-down, multi-scale analysis over trace data, leading to a better understanding and comprehension of the underlying system execution. The proposed framework is studied in two major parts: multi-level trace abstraction, and multi-level trace organization and visualization. The first part discusses the techniques used to abstract the trace data using either the content of the trace events, a predefined set of metrics and measures, or structure-based abstraction to extract the resources involved in the system execution.
The second part covers the hierarchical organization of the generated abstract events, establishes links between the related events, and finally visualizes the events using a zoomable timeline view. This view displays the events at different granularity levels and enables hierarchical navigation through the different layers of trace events by supporting semantic zooming. Using this tool, users can first see an overview of the execution, and can then pan around the view and focus and zoom on any area of interest for more details and insight. The proposed view synchronizes and coordinates the different view levels by establishing links between the data, structurally or semantically. Structural linking uses the bounding timestamps of the events to link the data, while semantic linking uses a pre-analysis of the trace events to find their relevance and to link them together. Establishing links connects information that is conceptually related (e.g., events that belong to the same process or specify the same behavior), so that events in one layer can be analyzed with respect to the events and information in other layers, leading to a multi-level analysis and a better comprehension of the underlying system execution. In this project, all evaluations and experiments were conducted on operating system level traces obtained with the Linux Trace Toolkit Next Generation (LTTng). LTTng is a lightweight, low-impact, open-source tracing tool that provides valuable runtime information from the various modules of the underlying system, at both the Linux kernel and user-space levels. LTTng works by instrumenting the kernel and user-space applications with statically inserted tracepoints, and by generating log entries at runtime each time a tracepoint is hit.
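The structural linking described above can be sketched in a few lines: each abstract event carries bounding timestamps, and a raw event is attached to the abstract event whose interval contains its timestamp. This is a minimal illustration, not the thesis's implementation; the event field names (`start`, `end`, `ts`) are assumptions.

```python
from bisect import bisect_right

# Hypothetical event records; field names are illustrative, not LTTng's.
abstract_events = [
    {"name": "file write burst", "start": 100, "end": 180},
    {"name": "network request",  "start": 200, "end": 260},
]
raw_events = [
    {"ts": 120, "name": "sys_write"},
    {"ts": 210, "name": "sys_sendmsg"},
    {"ts": 300, "name": "sys_close"},
]

def link_structurally(abstract_events, raw_events):
    """Attach each raw event to the abstract event whose
    [start, end] interval bounds its timestamp."""
    starts = [a["start"] for a in abstract_events]  # assumed sorted
    links = {id(a): [] for a in abstract_events}
    unlinked = []
    for ev in raw_events:
        i = bisect_right(starts, ev["ts"]) - 1
        if i >= 0 and ev["ts"] <= abstract_events[i]["end"]:
            links[id(abstract_events[i])].append(ev)
        else:
            unlinked.append(ev)
    return links, unlinked
```

Semantic linking, by contrast, would need the pre-analysis pass mentioned in the abstract to decide relevance; timestamps alone are not enough for it.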

    Multi-scale navigation of large trace data: A survey

    Dynamic analysis through execution traces is frequently used to analyze the runtime behavior of software systems. However, tracing long-running executions generates voluminous data, which is complicated to analyze and manage. Extracting interesting performance or correctness characteristics out of large traces of data from several processes and threads is a challenging task. Trace abstraction and visualization are potential solutions to alleviate this challenge. Several efforts have been made over the years, in many subfields of computer science, for trace data collection, maintenance, analysis, and visualization. Many analyses start with an inspection of an overview of the trace, before digging deeper and studying more focused and detailed data. These techniques are common and well supported in geographical information systems, which automatically adjust the level of detail depending on the scale. However, most trace visualization tools operate at a single level of representation, which is not adequate to support multilevel analysis. Sophisticated techniques and heuristics are needed to address this problem. Multi-scale (multilevel) visualization with support for zoom and focus operations is an effective way to enable this kind of analysis. Considerable research and several surveys on trace visualization have been published, but multi-scale visualization has so far received little attention. In this paper, we provide a survey and a methodological structure for categorizing tools and techniques aiming at multi-scale abstraction and visualization of execution trace data, and we discuss the requirements and challenges that must be met to satisfy evolving user demands.

    A framework to compute statistics of system parameters from very large trace files

    In this paper, we present a framework to compute, store, and retrieve statistics of various system metrics from large traces in an efficient way. The proposed framework allows rapid interactive queries about system metric values for any given time interval. In the proposed framework, efficient data structures and algorithms are designed to achieve a reasonable query time while using less disk space. A parameter termed granularity degree (GD) is defined to determine the threshold of how often the precomputed statistics must be stored on disk. The solution supports the hierarchy of system resources as well as different granularities of time ranges. We explain the architecture of the framework and show how it can be used to efficiently compute and extract CPU usage and other system metrics. The importance of the framework and its different applications are shown and evaluated in this paper.
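The core idea of checkpointed interval statistics can be sketched simply: store a cumulative value every GD events, so that any interval query touches two checkpoints plus at most GD replayed events on each side. This is a hedged illustration of the general technique, not the paper's data structures; the sample costs and the GD value are made up.

```python
# Sketch of checkpoint-based interval statistics. The "granularity
# degree" (GD) below mirrors the paper's parameter in spirit only:
# every GD events we store a cumulative value, so a query replays at
# most GD events on each side of the interval.

GD = 4  # checkpoint spacing (granularity degree)

# Hypothetical per-event CPU-time samples (event index -> cost).
events = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]

# Precompute: checkpoints[i] = sum of events[0 : i * GD]
checkpoints = [0]
running = 0
for idx, cost in enumerate(events, start=1):
    running += cost
    if idx % GD == 0:
        checkpoints.append(running)

def interval_sum(lo, hi):
    """Total cost of events[lo:hi] using checkpoints plus a partial replay."""
    def prefix(n):
        cp = n // GD
        total = checkpoints[cp]
        for i in range(cp * GD, n):  # replay fewer than GD events
            total += events[i]
        return total
    return prefix(hi) - prefix(lo)
```

A larger GD uses less disk at the cost of longer replays per query, which is exactly the trade-off the GD parameter controls.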

    Efficient methods for trace analysis parallelization

    Tracing provides a low-impact, high-resolution way to observe the execution of a system. As the amount of parallelism in traced systems increases, so does the data generated by the trace. Most trace analysis tools work in a single thread, which hinders their performance as the scale of data increases. In this paper, we explore parallelization as an approach to speed up system trace analysis. We propose a solution which uses the inherent aspects of the CTF trace format to create balanced and parallelizable workloads. Our solution takes into account key factors of parallelization, such as good load balancing, low synchronization overhead, and an efficient resolution of data dependencies. We also propose an algorithm to detect and resolve data dependencies during trace analysis, with minimal locking and synchronization. Using this approach, we implement three different trace analysis programs: event counting, CPU usage analysis, and I/O usage analysis, to assess the scalability in terms of parallel efficiency. The parallel implementations achieve a parallel efficiency above 56% with 32 cores, which translates to a speedup of 18 over the serial version, when running the parallel trace analyses on trace data stored on consumer-grade solid-state storage devices. We also show the scalability and potential of our approach by measuring the effect of future improvements to trace decoding on parallel efficiency.
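The map-then-merge structure behind parallel event counting can be illustrated in miniature: each worker counts one chunk independently (no shared state during the scan), and the partial results are merged once at the end. This is a sketch under simplifying assumptions; a real CTF analysis would split on packet boundaries and typically use processes rather than threads.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Pre-split trace chunks; event names are illustrative.
chunks = [
    ["sched_switch", "sys_open", "sys_read"],
    ["sys_read", "sys_write", "sched_switch"],
    ["irq_entry", "sys_read"],
]

def count_chunk(chunk):
    """Independent per-chunk analysis: no shared state, no locking."""
    return Counter(chunk)

def parallel_count(chunks, workers=3):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_chunk, chunks))
    # Reduction step: merge per-chunk results once at the end,
    # avoiding synchronization during the scan itself.
    total = Counter()
    for p in partials:
        total.update(p)
    return total
```

Event counting is the easy case because chunks have no data dependencies; the paper's dependency-resolution algorithm is needed for analyses (such as CPU usage) where one chunk's state depends on events in an earlier chunk.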

    Efficient cloud tracing: From very high level to very low level

    With the increase in cloud infrastructure complexity, the origin of service deterioration is difficult to detect, because issues may occur at different layers of the system. We propose a multi-layer tracing approach to gather all the relevant information needed for a full workflow analysis. The idea is to collect trace events from all the cloud nodes to follow users' requests from the cloud interface down to their execution on the hardware. Our approach involves tracing OpenStack's interfaces, the virtualization layer, and the host kernel space to perform analysis and show abnormal tasks and the main causes of latency or failures in the system. Experimental results on virtual machine live migration confirm that we can analyze service efficiency by locating the platform's weakest links.
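Following a request across layers amounts to grouping events from every node by a shared request identifier and ordering them in time. The sketch below is a minimal illustration of that stitching step; the layer names and the `req_id` field are assumptions (OpenStack traces would carry a real request UUID).

```python
from collections import defaultdict

# Hypothetical cross-layer trace events, already merged from all nodes.
events = [
    {"ts": 1, "layer": "api",    "req_id": "r1", "msg": "POST /servers"},
    {"ts": 2, "layer": "virt",   "req_id": "r1", "msg": "spawn vm"},
    {"ts": 2, "layer": "api",    "req_id": "r2", "msg": "GET /servers"},
    {"ts": 5, "layer": "kernel", "req_id": "r1", "msg": "kvm_entry"},
]

def build_workflows(events):
    """Group events by request id and order each group by timestamp,
    yielding one cross-layer timeline per user request."""
    flows = defaultdict(list)
    for ev in sorted(events, key=lambda e: e["ts"]):
        flows[ev["req_id"]].append(ev)
    return dict(flows)
```

With one timeline per request, gaps between consecutive layers point directly at the layer responsible for latency.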

    A stateful approach to generate synthetic events from kernel traces

    We propose a generic synthetic event generator working from kernel trace events. The proposed method makes use of patterns of system states and environment-independent semantic events, rather than platform-specific raw events. This method can be applied to different kernel and user level trace formats. We use a state model to store intermediate states and events. This stateful method supports partial trace abstraction and enables users to seek and navigate through the trace events and to abstract out the desired part. Since it uses the current and previous values of the system states, and thus has more knowledge of the underlying system execution, it can generate a wide range of synthetic events. One obvious application of this method is the identification of system faults and problems, which is demonstrated later in this paper. We discuss the architecture of the method, its implementation, and the performance results.
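The stateful idea can be shown in miniature: a state model keeps a partially built synthetic event per resource (here, per file descriptor) and folds raw open/read/close events into one higher-level "file access" event. This is a hedged sketch, not the paper's generator; the event shapes are hypothetical stand-ins for kernel trace records.

```python
# Minimal stateful abstraction: fold raw open/read/close events into
# one synthetic "file_access" event per file descriptor.

raw = [
    ("open",  {"fd": 3, "path": "/etc/passwd", "ts": 10}),
    ("read",  {"fd": 3, "bytes": 128, "ts": 11}),
    ("read",  {"fd": 3, "bytes": 64,  "ts": 12}),
    ("close", {"fd": 3, "ts": 13}),
]

def synthesize(raw_events):
    state = {}          # fd -> partially built synthetic event
    synthetic = []
    for name, ev in raw_events:
        if name == "open":
            state[ev["fd"]] = {"path": ev["path"], "start": ev["ts"], "bytes": 0}
        elif name == "read" and ev["fd"] in state:
            state[ev["fd"]]["bytes"] += ev["bytes"]
        elif name == "close" and ev["fd"] in state:
            s = state.pop(ev["fd"])
            s["end"] = ev["ts"]
            synthetic.append(("file_access", s))
    return synthetic
```

Because the state model persists between events, the generator can start from any point in the trace (partial abstraction) and still accumulate enough context to emit correct synthetic events.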

    System and Application Performance Analysis Patterns Using Software Tracing

    Software systems have become increasingly complex, which makes it difficult to detect the root causes of performance degradation. Software tracing has been used extensively to analyze systems at run time, to detect performance issues, and to uncover their causes. Several studies use tracing and other dynamic analysis techniques for performance analysis, focusing on specific system characteristics such as latency, performance bugs, etc. In this thesis, we review the literature to build a catalogue of performance analysis patterns that can be detected using trace data. The goal is to help developers debug run-time and performance issues more efficiently. The patterns are formalized and implemented so that developers can readily refer to them while analyzing large execution traces. The thesis focuses on the traces of system calls generated by the Linux kernel, because no application is an island: we cannot ignore the complex interactions that an application has with the operating system kernel if we are to detect potential performance issues.
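One pattern from such a catalogue, sketched here as an illustration: flag system calls whose entry-to-exit latency exceeds a budget. The trace record shapes and the threshold value are assumptions, not the thesis's formalization.

```python
THRESHOLD = 100  # hypothetical latency budget, in microseconds

# Illustrative syscall entry/exit records for one thread.
trace = [
    ("entry", {"tid": 7, "name": "read",  "ts": 0}),
    ("exit",  {"tid": 7, "name": "read",  "ts": 40}),
    ("entry", {"tid": 7, "name": "fsync", "ts": 50}),
    ("exit",  {"tid": 7, "name": "fsync", "ts": 400}),
]

def slow_syscalls(trace, threshold=THRESHOLD):
    """Pair each syscall exit with its entry (per thread) and report
    calls whose latency exceeds the threshold."""
    pending = {}   # tid -> entry record awaiting its exit
    slow = []
    for kind, ev in trace:
        if kind == "entry":
            pending[ev["tid"]] = ev
        elif kind == "exit" and ev["tid"] in pending:
            entry = pending.pop(ev["tid"])
            latency = ev["ts"] - entry["ts"]
            if latency > threshold:
                slow.append((ev["name"], latency))
    return slow
```

More elaborate patterns in a catalogue follow the same shape: a small state machine over syscall events plus a predicate that flags the problematic instances.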

    Dynamic trace-based sampling algorithm for memory usage tracking of enterprise applications

    Excessive memory usage in software applications has become a frequent issue. A high degree of parallelism, and the difficulty of monitoring it for the developer, can quickly lead to memory shortages or can increase the duration of garbage collection cycles. Several solutions have been introduced to monitor memory usage in software, but they are neither efficient nor scalable. In this paper, we propose a dynamic tracing-based sampling algorithm to collect and analyze run-time information and metrics for memory usage. It is implemented as a kernel module which gathers memory usage data from operating system structures only when a predefined condition is met or a threshold is passed. The thresholds and conditions are preset but can be changed dynamically, based on the application behavior. We tested our solution by monitoring several applications, and our evaluation results show that the proposed method generates compact trace data and reduces the time needed for the analysis, without losing precision.
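The condition-triggered sampling idea can be sketched as follows: record a memory sample only when usage crosses the current threshold, then raise the threshold so that steady behavior produces no further trace data. This is a simplified user-space illustration of the general technique, not the paper's kernel module; the numbers and the adaptation rule are made up.

```python
def sample_on_threshold(usage_stream, start_threshold, step):
    """Emit a sample only when usage reaches the current threshold,
    then move the threshold above the observed value so quiet periods
    generate no trace events."""
    threshold = start_threshold
    samples = []
    for ts, usage in usage_stream:
        if usage >= threshold:
            samples.append((ts, usage))   # emit a trace event
            threshold = usage + step      # adapt: stay quiet until growth resumes
    return samples

# Hypothetical (timestamp, memory-usage) stream.
stream = [(0, 10), (1, 55), (2, 60), (3, 58), (4, 90), (5, 200)]
```

Only three of the six samples above would be traced, which is how the method keeps the trace compact while still capturing every significant growth step.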

    Efficient large-scale heterogeneous debugging using dynamic tracing

    Heterogeneous multi-core and many-core processors are increasingly common in personal computers and industrial systems. Efficient software development on these platforms needs suitable debugging tools, beyond traditional interactive debuggers. An alternative for interactively following the execution flow of a program is tracing within the debugging environment, as long as the tracer has minimal overhead. In this paper, the dynamic tracing infrastructure of the GNU debugger (GDB) is investigated to understand its performance limitations. We then propose an improved architecture for dynamic tracing on many-core processors within GDB, and demonstrate its scalability on highly parallel platforms. In addition, the scalability of the thread data collection and presentation component is studied, and new views are proposed within the Eclipse Debugging Service Framework and the Trace Compass visualization tool. With these scalability enhancements, debuggers such as GDB can more efficiently help debug multi-threaded programs on heterogeneous many-core processors composed of multi-core CPUs and GPUs containing thousands of cores.